Josef Fruehwald
11/2/2017
Statistical graphics are one of the primary means of reporting our data.
For many readers (and the field), what is communicated in graphs will endure in their memory and understanding of your work longer than what is written in the text.
Without Points
Figure XX: A plot of normalized F1 over speakers’ year of birth. Each point represents the average normalized F1 for each speaker for each following place of articulation, with penalized cubic regression splines overlaid.
Plot without the raw data points when they inhibit legibility too much.
Figure XX: Estimated normalized F1 over speakers’ year of birth. Penalized cubic regression splines fit over speaker-level averages.
It’s crucially important to describe how averages were estimated. You could lump all the data together (total pooling). Or, you could average within speakers first, then within place (speaker pooling). Or, you could average within words within speakers first, then within speakers, then within place (speaker & word pooling).
How these three methods look in R code:

ay0 %>%
  filter(place %in% c("apical", "labial", "velar")) %>%
  group_by(place) %>%
  summarise(dur = mean(dur)) %>%
  mutate(method = "total pooling") -> pool1

ay0 %>%
  filter(place %in% c("apical", "labial", "velar")) %>%
  group_by(place, idstring) %>%
  summarise(dur = mean(dur)) %>%
  summarise(dur = mean(dur)) %>%
  mutate(method = "speaker pooling") -> pool2

ay0 %>%
  filter(place %in% c("apical", "labial", "velar")) %>%
  group_by(place, idstring, word) %>%
  summarise(dur = mean(dur)) %>%
  summarise(dur = mean(dur)) %>%
  summarise(dur = mean(dur)) %>%
  mutate(method = "speaker & word pooling") -> pool3

It can make a big difference.
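To see why, here is a toy demonstration with made-up numbers (base R only, not the ay0 data): when the data are unbalanced, total pooling and speaker-first pooling disagree.

```r
# Hypothetical durations: speaker A contributes three tokens, speaker B one.
dur     <- c(0.10, 0.12, 0.14, 0.30)
speaker <- c("A", "A", "A", "B")

# Total pooling: a single grand mean over all tokens.
total_pool <- mean(dur)                      # 0.165

# Speaker pooling: average within speakers first, then across speakers.
speaker_means <- tapply(dur, speaker, mean)  # A = 0.12, B = 0.30
speaker_pool  <- mean(speaker_means)         # 0.21
```

Under total pooling, speaker A's three tokens dominate the estimate; averaging within speakers first gives each speaker equal weight, which is the logic behind the incremental averaging described below.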
Averages were estimated by first estimating word-level averages within speakers, then speaker-level averages within place of articulation, then global averages for place of articulation.
Figures display incremental averages with words nested within speakers nested within place of articulation.
Sometimes, it is advisable to visualize the output of a statistical model.
library(lme4)

ay0_tomod <- ay0 %>%
  filter(fol_seg %in% c("P", "T", "K")) %>%
  mutate(dob = (dob - 1950) / 10,             # center date of birth, scale to decades
         logdur = log2(dur),
         c_dur = (logdur - median(logdur)),   # center log duration
         fol_seg = factor(fol_seg, levels = c("T", "P", "K")))

ay0_mod <- lmer(F1_n ~ dob * fol_seg + c_dur +
                  (fol_seg + c_dur | idstring) + (1 | word),
                data = ay0_tomod)
summary(ay0_mod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: F1_n ~ dob * fol_seg + c_dur + (fol_seg + c_dur | idstring) +
## (1 | word)
## Data: ay0_tomod
##
## REML criterion at convergence: 23170.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.1998 -0.5782 -0.0426 0.5231 10.2801
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## idstring (Intercept) 0.03198 0.1788
## fol_segP 0.12939 0.3597 -0.07
## fol_segK 0.03345 0.1829 -0.44 0.28
## c_dur 0.01141 0.1068 0.18 -0.21 0.02
## word (Intercept) 0.06031 0.2456
## Residual 0.20644 0.4544
## Number of obs: 17340, groups: idstring, 326; word, 182
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.520403 0.033874 15.363
## dob -0.100968 0.005632 -17.926
## fol_segP 0.096594 0.089014 1.085
## fol_segK 0.060603 0.064540 0.939
## c_dur 0.314384 0.010448 30.090
## dob:fol_segP -0.001226 0.020677 -0.059
## dob:fol_segK -0.030786 0.006194 -4.970
##
## Correlation of Fixed Effects:
## (Intr) dob fl_sgP fl_sgK c_dur db:f_P
## dob 0.074
## fol_segP -0.343 -0.010
## fol_segK -0.494 -0.027 0.189
## c_dur -0.042 0.086 -0.040 0.009
## dob:fol_sgP -0.004 -0.109 0.148 0.008 -0.015
## dob:fol_sgK -0.044 -0.582 0.019 0.043 -0.037 0.177
One recommended approach for getting confidence intervals is bootstrap replication. Explaining how and why bootstrap replication works is a bit beyond what I can do here. But see here.
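As a minimal illustration of the bootstrap idea (simulated stand-in data, base R only — not the mixed-model machinery that bootMer() uses): resample the data with replacement, recompute the statistic each time, and take quantiles of the replicates as the interval.

```r
set.seed(1)
x <- rnorm(100, mean = 0.5, sd = 0.45)   # stand-in sample of normalized F1 values

# 500 bootstrap replicates of the sample mean
boot_means <- replicate(500, mean(sample(x, replace = TRUE)))

# 95% percentile confidence interval
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```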
ay0_boot <- bootMer(ay0_mod,
                    FUN = fixef,
                    nsim = 500,
                    type = "parametric",
                    use.u = TRUE)

We estimated confidence intervals based on 500 parametric bootstrap replications.
Figure XX: Parameter plot visualizing a subset of coefficient estimates and 95% bootstrap confidence intervals
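One way such a parameter plot might be built: bootMer() returns a "boot" object whose $t element is a matrix of replicates, one column per fixed effect. Here the matrix is simulated with made-up effect names and values so the sketch is self-contained.

```r
# Simulated stand-in for ay0_boot$t (values and names are invented).
set.seed(10)
boot_t <- cbind(dob            = rnorm(500, -0.101, 0.006),
                `dob:fol_segK` = rnorm(500, -0.031, 0.006))

# Summarize each coefficient: median and 95% percentile interval.
param_summary <- data.frame(
  term   = colnames(boot_t),
  median = apply(boot_t, 2, median),
  lo     = apply(boot_t, 2, quantile, probs = 0.025),
  hi     = apply(boot_t, 2, quantile, probs = 0.975)
)

# A parameter plot is then roughly:
# ggplot(param_summary, aes(x = median, y = term)) +
#   geom_point() +
#   geom_errorbarh(aes(xmin = lo, xmax = hi), height = 0)
```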
Figure XX: Fitted values from 500 parametric bootstrap replicates.
With the increasing use of non-linear modelling methods, it will be important to start improving how we draw inferences from figures.
NO!
Figure XX. The confidence intervals overlap at all points so there is no difference.
YES!
Difference curves can be obtained using get_difference() from the itsadug package!
Figure XX. Estimated difference curve between /ayk/ and /ayt/
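The logic can be illustrated with a toy numeric example (made-up estimates and standard errors): two estimates whose individual 95% intervals overlap can still have a difference whose interval excludes zero, which is why the difference should be examined directly.

```r
# Hypothetical point estimates and standard errors for two conditions.
est_a <- 0.00; se_a <- 0.05
est_b <- 0.16; se_b <- 0.05

# The individual 95% CIs overlap...
ci_a <- est_a + c(-1.96, 1.96) * se_a   # [-0.098, 0.098]
ci_b <- est_b + c(-1.96, 1.96) * se_b   # [ 0.062, 0.258]
overlap <- ci_b[1] < ci_a[2]            # TRUE

# ...but the 95% CI for the difference excludes zero.
se_diff <- sqrt(se_a^2 + se_b^2)
diff_ci <- (est_b - est_a) + c(-1.96, 1.96) * se_diff  # approx. [0.021, 0.299]
```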
But be judicious in making overly strong inferences!